ε-MDPs: Learning in Varying Environments

Authors

  • István Szita
  • Bálint Takács
  • András Lőrincz
Abstract

In this paper, ε-MDP models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning, which separates the optimization of decision making from the controller. We show that event-learning augmented by a particular controller, which gives rise to an ε-MDP, enables near-optimal performance even when considerable, sudden changes occur in the environment. Illustrations are provided on the two-segment pendulum problem.
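
As a hedged illustration (not code from the paper), the tabular Q-learning algorithm the abstract refers to can be sketched as below. The environment object `env`, its `reset()`/`step()` interface, and all parameter values are assumptions made for the sketch; note that the exploration rate `epsilon` here is the usual ε-greedy parameter and is unrelated to the ε of ε-MDPs, which (roughly) bounds how far the changing environment may drift from a base MDP.

    import random

    # Minimal tabular Q-learning sketch (illustrative only; not the
    # paper's code). `env` is a hypothetical environment exposing
    # reset() -> state and step(action) -> (next_state, reward, done),
    # with states and actions encoded as small integers.
    def q_learning(env, n_states, n_actions, episodes=500,
                   alpha=0.1, gamma=0.95, epsilon=0.1):
        Q = [[0.0] * n_actions for _ in range(n_states)]
        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                # epsilon-greedy action selection
                if random.random() < epsilon:
                    a = random.randrange(n_actions)
                else:
                    a = max(range(n_actions), key=lambda x: Q[s][x])
                s_next, r, done = env.step(a)
                # update toward the bootstrapped one-step target
                target = r + gamma * max(Q[s_next])
                Q[s][a] += alpha * (target - Q[s][a])
                s = s_next
        return Q

The paper's point is that updates of this form still yield near-optimal policies (up to an error controlled by ε) when the dynamics are allowed to vary within an ε-MDP rather than being fixed.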


Similar Articles

Incremental Structure Learning in Factored MDPs with Continuous States and Actions

Learning factored transition models of structured environments has been shown to provide significant leverage when computing optimal policies for tasks within those environments. Previous work has focused on learning the structure of factored Markov Decision Processes (MDPs) with finite sets of states and actions. In this work we present an algorithm for online incremental learning of transitio...

Value Function Based Reinforcement Learning in Changing Markovian Environments

The paper investigates the possibility of applying value-function-based reinforcement learning (RL) methods in cases where the environment may change over time. First, theorems are presented which show that the optimal value function of a discounted Markov decision process (MDP) depends Lipschitz continuously on the immediate-cost function and the transition-probability function. Dependence on t...
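
For orientation, here is a sketch of what such Lipschitz bounds typically look like for a discounted MDP with immediate-cost function g, transition function P, and discount factor γ; the exact constants and norms in the paper may differ:

    \[
      \|V_1^* - V_2^*\|_\infty \le \frac{1}{1-\gamma}\,\|g_1 - g_2\|_\infty,
      \qquad
      \|V_1^* - V_2^*\|_\infty \le \frac{\gamma\,\|g\|_\infty}{(1-\gamma)^2}\,\|P_1 - P_2\|_1,
    \]

where the first bound compares two MDPs that differ only in their cost functions and the second compares two MDPs that differ only in their transition functions, the P-distance being the maximum over state-action pairs of the L1 distance between next-state distributions.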

Active Reinforcement Learning with Monte-Carlo Tree Search

Active Reinforcement Learning (ARL) is a twist on RL where the agent observes reward information only if it pays a cost. This subtle change makes exploration substantially more challenging. Powerful principles in RL like optimism, Thompson sampling, and random exploration do not help with ARL. We relate ARL in tabular environments to Bayes-Adaptive MDPs. We provide an ARL algorithm using Monte-C...

Feature Dynamic Bayesian Networks

Feature Markov Decision Processes (ΦMDPs) [Hut09] are well-suited for learning agents in general environments. Nevertheless, unstructured (Φ)MDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend ΦMDP to ΦDBN. The primary contribution is to derive a cost criterion that allows...

A Novel Approach of Route Choice in Stochastic Time-varying Networks

Many existing studies use Markov decision processes (MDPs) to model optimal route choice in stochastic, time-varying networks. However, transforming large volumes of variable traffic data into optimal route decisions with MDPs is computationally challenging in real transportation networks. In this paper we model finite-horizon MDPs using directed hypergraphs. It is shown that th...


Journal:
  • Journal of Machine Learning Research

Volume 3, Issue -

Pages -

Publication date: 2002